Research on Tibetan Text Orientation Identification

Authors

  • Xiaodong Yan
  • Xiaobing Zhao

Abstract

In recent years, minority languages in China have come into wide use on computers and networks. However, there is still no effective public opinion analysis system for capturing the overall attitude of minority populations toward hot events or topics. In this study, we investigate orientation recognition for Tibetan topics. First, based on the Tibetan context and the characteristics of Tibetan daily life, and drawing on a set of emotional words from HowNet, we build a Tibetan emotional word dictionary. We then extend this dictionary with a Tibetan word semantic similarity calculation method to obtain a richer emotional word set. We also propose a method in which the orientation of a sentence is determined by the orientations of the words it contains, and the orientation of a text is determined by the orientations of its sentences. With this approach, Tibetan hotspot information can be detected rapidly and public opinion trends can be tracked quickly, which supports the positive guidance of public opinion.
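
The abstract describes the pipeline only at a high level, so the Python sketch below is one possible reading of it and not the authors' implementation. It assumes a seed lexicon mapping words to +1 (positive) or -1 (negative). Dictionary extension copies the polarity of the most similar seed word when the similarity reaches a threshold. A sentence's orientation is the sign of its summed word polarities, and a text's orientation is the sign of its summed sentence orientations. The summation rule, the 0.7 threshold, and every function name here are hypothetical. `string_similarity` (built on `difflib.SequenceMatcher.ratio`) is only a placeholder for the paper's Tibetan word semantic similarity measure, and the English tokens stand in for segmented Tibetan words.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, List

Similarity = Callable[[str, str], float]


def string_similarity(a: str, b: str) -> float:
    """Placeholder similarity in [0, 1]; the paper uses a Tibetan
    word semantic similarity method instead (assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def extend_lexicon(seed: Dict[str, int], candidates: Iterable[str],
                   sim: Similarity, threshold: float = 0.7) -> Dict[str, int]:
    """Hypothetical extension step: each unseen candidate inherits the
    polarity of its most similar seed word if similarity >= threshold."""
    lexicon = dict(seed)
    for word in candidates:
        if word in lexicon:
            continue
        best_score, best_polarity = 0.0, 0
        for seed_word, polarity in seed.items():
            score = sim(word, seed_word)
            if score > best_score:
                best_score, best_polarity = score, polarity
        if best_score >= threshold:
            lexicon[word] = best_polarity
    return lexicon


def sign(x: float) -> int:
    """Return 1, -1, or 0 (neutral)."""
    return int(x > 0) - int(x < 0)


def sentence_orientation(words: List[str], lexicon: Dict[str, int]) -> int:
    """Sentence orientation from word orientations (assumed: sign of the sum)."""
    return sign(sum(lexicon.get(w, 0) for w in words))


def text_orientation(sentences: List[List[str]], lexicon: Dict[str, int]) -> int:
    """Text orientation from sentence orientations (assumed: sign of the sum)."""
    return sign(sum(sentence_orientation(s, lexicon) for s in sentences))


# Usage with placeholder tokens standing in for segmented Tibetan words.
seed = {"good": 1, "happy": 1, "bad": -1, "sad": -1}
lexicon = extend_lexicon(seed, ["goods", "sadly", "river"], string_similarity)
sentences = [
    ["the", "day", "was", "good"],
    ["people", "felt", "sadly", "bad"],
    ["goods", "happy"],
]
print(text_orientation(sentences, lexicon))  # → 1
```

Summing sentence signs gives each sentence one vote regardless of how many emotional words it contains; weighting sentences by their raw word scores would be an equally plausible reading that the abstract does not rule out.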

Similar articles

Tibetan Unknown Word Identification from News Corpora for Supporting Lexicon-based Tibetan Word Segmentation

In Tibetan, because words are written consecutively without delimiters, finding the boundaries of unknown words is difficult. This paper presents a hybrid approach to Tibetan unknown word identification for offline corpus processing. First, Tibetan named entities are preprocessed based on natural annotation. Second, other Tibetan unknown words are extracted from word segmentation fragments using MTC, the co...

Research on Tibetan Automatic Word Segmentation

This paper studies Tibetan automatic word segmentation. We focus on three key technologies of Tibetan automatic word segmentation: (1) a Tibetan automatic word segmentation approach is proposed that takes advantage of case-auxiliary words and the continuous feature; (2) a resolution method for overlapping ambiguity in Tibetan word segmentation is proposed, which is based on forward-b...

Tibetan Number Identification Based on Classification of Number Components in Tibetan Word Segmentation

Tibetan word segmentation is essential for Tibetan information processing. At present, Tibetan words are mainly segmented with the basic dictionary-based machine matching method, because no segmented Tibetan corpus is available for training. However, the dictionary-based method is not suitable for Tibetan number identification. This paper studies...

Tibetan Syllable-Based Functional Chunk Boundary Identification

Tibetan syntactic functional chunk parsing aims to identify the syntactic constituents of Tibetan sentences. In this paper, based on the Tibetan syntactic functional chunk description system, we propose a method that groups syllables instead of performing word segmentation and tagging, and uses Conditional Random Fields (CRFs) to identify the functional chunk boundaries of a sentence. Accordin...

Tibetan Multi-word Expressions Identification Framework Based on News Corpora

This paper presents an identification framework for extracting Tibetan multi-word expressions. The framework includes two phases. In the first phase, sentences are segmented and high-frequency word-based n-grams are extracted using Nagao’s N-gram statistical algorithm and Statistical Substring Reduction Algorithm. In the second phase, the Tibetan MWEs are identified by the proposed framework wh...

Journal:
  • JCP

Volume 9

Publication date: 2014